Chain-of-Thought (CoT) prompting can dramatically improve the multi-step reasoning abilities of large language models (LLMs). CoT explicitly encourages the LLM to generate intermediate rationales for solving a problem, by providing a series of reasoning steps in the demonstrations. Despite its success, there is still little understanding of what makes CoT prompting effective and which aspects of the demonstrated reasoning steps contribute to its performance. In this paper, we show that CoT reasoning is possible even with invalid demonstrations: prompting with invalid reasoning steps can achieve over 80-90% of the performance obtained using CoT under various metrics, while still generating coherent lines of reasoning during inference. Further experiments show that other aspects of the rationales, such as being relevant to the query and correctly ordering the reasoning steps, are much more important for effective CoT reasoning. Overall, these findings both deepen our understanding of CoT prompting, and open up new questions regarding LLMs' capability to learn to reason in context.
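The contrast between valid and invalid demonstrations can be made concrete with a minimal prompt-construction sketch (the few-shot examples below are hypothetical illustrations, and the model call itself is omitted):

```python
# Minimal sketch of Chain-of-Thought prompt construction, contrasting a
# valid demonstration with an invalid one. Examples are hypothetical;
# the actual LLM call is omitted.

def build_cot_prompt(demos, query):
    """Assemble a few-shot CoT prompt from (question, rationale, answer) demos."""
    parts = []
    for question, rationale, answer in demos:
        parts.append(f"Q: {question}\nA: {rationale} The answer is {answer}.")
    parts.append(f"Q: {query}\nA:")
    return "\n\n".join(parts)

# A valid rationale: each step follows from the previous one.
valid_demo = (
    "Roger has 5 balls and buys 2 cans of 3 balls each. How many balls now?",
    "Roger started with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11.",
    "11",
)

# An invalid rationale: same surface form and final answer, but the
# intermediate arithmetic is wrong. The paper's finding is that such
# demonstrations still elicit most of CoT's gains.
invalid_demo = (
    "Roger has 5 balls and buys 2 cans of 3 balls each. How many balls now?",
    "Roger started with 5 balls. 2 cans of 3 balls is 5 balls. 5 + 5 = 11.",
    "11",
)

prompt = build_cot_prompt([valid_demo], "Jane has 3 apples and picks 4 more. How many?")
print(prompt)
```

Swapping `valid_demo` for `invalid_demo` changes only the rationale text, which is exactly the manipulation the paper studies.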
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%), and 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based; of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
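Patch-based training, the most common workaround reported for oversized samples, can be sketched as follows (patch size and stride are hypothetical choices; the training framework and augmentation are omitted):

```python
import numpy as np

def extract_patches(image, patch_size, stride):
    """Slide a window over a 2D image and collect fixed-size patches.
    Patches that would overrun the border are skipped for simplicity."""
    h, w = image.shape
    patches = []
    for y in range(0, h - patch_size + 1, stride):
        for x in range(0, w - patch_size + 1, stride):
            patches.append(image[y:y + patch_size, x:x + patch_size])
    return np.stack(patches)

# A 512x512 "image" that is too large to feed to the network at once
image = np.random.rand(512, 512)
patches = extract_patches(image, patch_size=128, stride=128)
print(patches.shape)  # (16, 128, 128): a 4x4 grid of non-overlapping patches
```

The same loop with a stride smaller than `patch_size` yields overlapping patches, a common variant when border artifacts matter.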
We present TacoBot, a task-oriented dialogue system built for the inaugural Alexa Prize TaskBot Challenge, which assists users in completing multi-step cooking and home improvement tasks. TacoBot is designed with user-centered principles and aspires to deliver a collaborative and accessible dialogue experience. Towards that end, it features accurate language understanding, flexible dialogue management, and engaging response generation. It is further backed by a robust search engine and an automated end-to-end test suite. In bootstrapping TacoBot's development, we explore a series of data augmentation strategies to train advanced neural language processing models, and continuously improve the dialogue experience with real conversations collected along the way. At the end of the semifinals, TacoBot achieved an average rating of 3.55/5.0.
Retrosynthesis is the process of deducing the potential reactants a molecule can be transformed from, thereby identifying a synthetic route. We propose a novel generative framework, denoted $\mathsf{G^2Retro}$, for one-step retrosynthesis prediction. $\mathsf{G^2Retro}$ imitates the reversed logic of synthetic reactions: it first predicts the reaction centers needed to convert a target molecule into fragments called synthons, and then transforms the synthons into reactants, following previous semi-template-based methods. In predicting reaction centers, $\mathsf{G^2Retro}$ defines a comprehensive set of reaction-center types and achieves diversity among the predicted reactions by considering multiple reaction-center candidates. In completing synthons, $\mathsf{G^2Retro}$ deploys a sequence of substructure attachments to transform synthons into reactants, which utilizes a holistic view of the most up-to-date structures of the synthons to be completed, as well as all the involved synthon and product structures. We demonstrate that $\mathsf{G^2Retro}$ prioritizes the most likely reactants on a benchmark dataset better than state-of-the-art methods, and that it can discover novel reactions not included in that dataset.
The strong few-shot in-context learning capability of large pre-trained language models (PLMs) such as GPT-3 is highly appealing for application domains such as biomedicine, which feature high and diverse demands of language technologies but also high data annotation costs. In this paper, we present the first systematic and comprehensive study to compare the few-shot performance of GPT-3 in-context learning with fine-tuning smaller (i.e., BERT-sized) PLMs on two highly representative biomedical information extraction tasks, named entity recognition and relation extraction. We follow the true few-shot setting to avoid overestimating models' few-shot performance by model selection over a large validation set. We also optimize GPT-3's performance with known techniques such as contextual calibration and dynamic in-context example retrieval. However, our results show that GPT-3 still significantly underperforms compared to simply fine-tuning a smaller PLM. In addition, GPT-3 in-context learning also yields smaller gains in accuracy when more training data becomes available. Our in-depth analyses further reveal issues of the in-context learning setting that may be detrimental to information extraction tasks in general. Given the high cost of experimenting with GPT-3, we hope our study provides guidance for biomedical researchers and practitioners towards more promising directions such as fine-tuning small PLMs.
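Contextual calibration, one of the known techniques the study applies to GPT-3, divides the model's label probabilities by those it assigns to a content-free input (e.g. "N/A") and renormalizes; a minimal numeric sketch, with all probabilities hypothetical:

```python
import numpy as np

def calibrate(probs, content_free_probs):
    """Contextual calibration sketch: divide label probabilities by the
    bias the model shows on a content-free input, then renormalize."""
    calibrated = probs / content_free_probs
    return calibrated / calibrated.sum()

# Hypothetical label probabilities the prompt elicits for a content-free
# input: the model is biased towards the first label regardless of content.
p_cf = np.array([0.7, 0.2, 0.1])

# Raw label probabilities for a real test input
p_test = np.array([0.5, 0.3, 0.2])

print(calibrate(p_test, p_cf))
```

In this toy example the raw prediction would be label 0 purely from prompt bias, while the calibrated distribution favors label 2, illustrating why the correction can matter in few-shot settings.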
Double-strand DNA breaks (DSBs) are a form of DNA damage that can lead to abnormal chromosomal rearrangements, and recent detection techniques based on high-throughput experiments suffer from notably high costs and technical challenges. We therefore design a graph neural network based method, GraphDSB, to predict DSBs using DNA sequence features and chromosome structure information. To improve the expressive power of the model, we introduce a Jumping Knowledge architecture and several effective structural encoding methods. The contribution of structural information to DSB prediction is validated by experiments on datasets from normal human epidermal keratinocytes (NHEK) and the chronic myeloid leukemia cell line (K562), and ablation studies further demonstrate the effectiveness of the designed components of the proposed GraphDSB framework. Finally, we use GNNExplainer to analyze the contributions of node features and topology to DSB prediction, and demonstrate the high contribution of 5-mer DNA sequence features and of two chromatin interaction modes.
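The Jumping Knowledge idea mentioned above keeps the per-layer node representations instead of only the last layer's; a toy numpy sketch of the concatenation variant (shapes and values are illustrative, and the real model computes the layer outputs with graph convolutions):

```python
import numpy as np

def jumping_knowledge_concat(layer_outputs):
    """Jumping Knowledge (concatenation variant): combine every GNN
    layer's node representations so the readout can mix information
    from different neighborhood ranges. Toy numpy sketch only."""
    return np.concatenate(layer_outputs, axis=-1)

rng = np.random.default_rng(0)
num_nodes, dim = 6, 4
# Hypothetical node representations from three GNN layers
layers = [rng.normal(size=(num_nodes, dim)) for _ in range(3)]

h = jumping_knowledge_concat(layers)
print(h.shape)  # (6, 12): per-node features from all three layers side by side
```

Other Jumping Knowledge variants replace the concatenation with max-pooling or an attention-weighted sum over layers.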
Long story generation (LSG) is one of the coveted goals of natural language processing. Unlike most text generation tasks, LSG requires producing a long story with rich content from a much shorter input, and therefore suffers from information sparsity. In this paper, we propose \emph{TopNet} to alleviate this problem by leveraging recent advances in neural topic modeling to obtain high-quality skeleton words that complement the short input. In particular, instead of generating a story directly, we first learn to map the short input to a low-dimensional topic distribution (pre-assigned by a topic model). Based on this latent topic distribution, we can then use the topic model's reconstruction decoder to sample a sequence of inter-related words as a skeleton for the story. Experiments on two benchmark datasets show that our framework is highly effective at skeleton-word selection and significantly outperforms state-of-the-art models in both automatic and human evaluation.
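The two-stage idea (short input mapped to a topic distribution, then skeleton words sampled from the topic model's decoder) can be sketched with a toy topic-word matrix; every number and word below is hypothetical, and the real system uses a learned neural topic model:

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["dragon", "castle", "sword", "rain", "harvest", "village"]
# Toy topic-word matrix: row t is the word distribution of topic t
# (in the real model this decoder comes from a pretrained topic model).
topic_word = np.array([
    [0.40, 0.30, 0.25, 0.01, 0.02, 0.02],   # a "fantasy" topic
    [0.02, 0.03, 0.01, 0.30, 0.34, 0.30],   # a "rural life" topic
])

# Stage 1 (stubbed): the short input is mapped to a latent topic
# distribution; here we just assume it is mostly about the first topic.
theta = np.array([0.9, 0.1])

# Stage 2: decode the topic distribution into word probabilities and
# sample a small set of skeleton words to scaffold the story.
word_probs = theta @ topic_word
skeleton = rng.choice(vocab, size=4, replace=False, p=word_probs / word_probs.sum())
print(skeleton)
```

With `theta` weighted towards the fantasy topic, the sampled skeleton is dominated by fantasy-topic words, which a downstream generator can then expand into a story.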
We study joint video-and-language (VL) pre-training to enable cross-modality learning and benefit a wide range of downstream VL tasks. Existing works either extract low-quality video features or learn limited text embeddings, neglecting that high-resolution videos and diversified semantics can greatly improve cross-modality learning. In this paper, we propose a novel High-resolution and Diversified VIdeo-LAnguage pre-training model (HD-VILA) for many visual tasks. In particular, we collect a large dataset with two distinct properties: 1) the first high-resolution dataset, comprising 371.5K hours of 720p videos, and 2) the most diversified dataset, covering 15 popular YouTube categories. To enable VL pre-training, we jointly optimize the HD-VILA model by a hybrid Transformer that learns rich spatiotemporal features and a multimodal Transformer that enforces interactions between the learned video features and diversified texts. Our pre-trained model achieves new state-of-the-art results on 10 VL understanding tasks and 2 novel text-to-visual generation tasks. For example, we outperform SOTA models with relative increases of 38.5% R@1 on the zero-shot MSR-VTT text-to-video retrieval task and 53.6% on the high-resolution dataset LSMDC. The learned VL embedding is also effective at generating visually pleasing and semantically relevant results on text-to-visual manipulation and super-resolution tasks.
The enormous waves of technological innovation over the past several years, marked by advances in AI, are profoundly reshaping industry and society. However, down the road, a key challenge awaits us: our capability of meeting rapidly-growing scenario-specific demands is severely limited by the cost of acquiring training data. This difficult situation is in essence due to limitations of the mainstream learning paradigm: we need to train a new model for each new scenario, based on a large quantity of well-annotated data and commonly from scratch. In tackling this fundamental problem, we move beyond it and develop a new learning paradigm named INTERN. By learning with supervisory signals from multiple sources in multiple stages, the trained model develops strong generalizability. We evaluate our models on 26 well-known datasets covering four categories of tasks in computer vision. In most cases, our models, adapted with only 10% of the training data in the target domain, consistently outperform counterparts trained with the full set of data, often by a significant margin. This is an important step towards a promising prospect in which a model with general vision capability can dramatically reduce reliance on data, thus expediting the adoption of AI technologies. Furthermore, revolving around our new paradigm, we also introduce a new data system, a new architecture, and a new benchmark, which together form a general vision ecosystem to support the paradigm's future development in an open and inclusive manner.
User representations are essential for providing high-quality commercial services in industry. Universal user representation has recently attracted much interest, since it frees us from the cumbersome work of training a task-specific model for each downstream application. In this paper, we attempt to improve universal user representation from two perspectives. First, a contrastive self-supervised learning paradigm is presented to guide the training of the representation model. It provides a unified framework that allows long-term or short-term interest representations to be learned in a data-driven manner. In addition, a novel multi-interest extraction module is presented. The module introduces an interest dictionary to capture the principal interests of a given user, and then generates his/her interest-oriented representations via behavior aggregation. Experimental results demonstrate the effectiveness and applicability of the learned user representations.
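The interest-dictionary idea can be sketched as attending each behavior embedding over a shared dictionary of interest vectors and then aggregating; the shapes and the mean-pooling aggregation below are illustrative assumptions, not the paper's exact architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_interest_extract(behaviors, interest_dict):
    """Sketch of interest-dictionary extraction: attend each behavior
    embedding over a shared dictionary of interest vectors, then
    aggregate the behaviors into one interest-oriented user vector.
    (Shapes and the pooling rule are illustrative assumptions.)"""
    # (num_behaviors, num_interests): how strongly each behavior
    # activates each entry of the interest dictionary
    scores = softmax(behaviors @ interest_dict.T)
    # Project behaviors onto the dictionary, then mean-pool over behaviors
    interest_views = scores @ interest_dict        # (num_behaviors, dim)
    return interest_views.mean(axis=0)             # (dim,)

rng = np.random.default_rng(0)
behaviors = rng.normal(size=(12, 8))      # 12 behavior embeddings, dim 8
interest_dict = rng.normal(size=(4, 8))   # dictionary of 4 interest vectors

user_repr = multi_interest_extract(behaviors, interest_dict)
print(user_repr.shape)  # (8,)
```

Because the dictionary is shared across users, each user's representation becomes a behavior-weighted mixture of a small set of global interest vectors.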